MODAL: A Multilingual Corpus Annotated for Modality
نویسندگان
چکیده
English. We have produced a corpus annotated for modality which amounts to approximately 20,000 words in English, French, and Italian. The annotation scheme is based on the notion of epistemic construction and virtually languageindependent. The annotation is rigorously evaluated by means of a newly developed strategy based on the alignment of the entire epistemic constructions as identified and marked up two annotators. The corpus and the agreement scoring tools are publicly available. Italiano. Presentiamo un corpus multilingue di circa 20,000 parole annotato per modalità epistemica. La procedura di annotazione è guidata dal concetto di costruzione epistemic. La validità dell’annotazione è valutata attraverso una strategia sviluppata per tenere conto della necessità di allineare intere costruzioni identificate da annotatori diversi. Il corpus e gli strumenti per la valutazione dell’annotazione sono resi disponibili.
منابع مشابه
3arif: A Corpus of Modern Standard and Egyptian Arabic Tweets Annotated for Epistemic Modality Using Interactive Crowdsourcing
We present 3arif, a large-scale corpus of Modern Standard and Egyptian Arabic tweets annotated for epistemic modality. To create 3arif, we design an interactive crowdsourcing annotation procedure that splits up the annotation process into a series of simplified questions, dispenses with the requirement for expert linguistic knowledge and captures nested modality triggers and their attributes se...
متن کاملA multilingual corpus for rich audio-visual scene description in a meeting-room environment
In this paper, we present a multilingual database specifically designed to develop technologies for rich audio-visual scene description in meeting-room environments. Part of that database includes the already existing CHIL audio-visual recordings, whose annotations have been extended. A relevant objective in the new recorded sessions was to include situations in which the semantic content can n...
متن کاملModality in Text: a Proposal for Corpus Annotation
We present a annotation scheme for modality in Portuguese. In our annotation scheme we have tried to combine a more theoretical linguistic viewpoint with a practical annotation scheme that will also be useful for NLP research but is not geared towards one specific application. Our notion of modality focuses on the attitude and opinion of the speaker, or of the subject of the sentence. We valida...
متن کاملBrowsing Multilingual Information with the MultiSemCor Web Interface
Parallel and comparable corpora represent a crucial resource for different Natural Language Processing tasks like machine translation, lexical acquisition, and knowledge structuring but are also suitable to be consulted by humans for different purposes, such as linguistic teaching, corpus linguistics, translation studies, lexicography, multilingual information browsing. To enhance their exploit...
متن کاملSemEval-2013 Task 12: Multilingual Word Sense Disambiguation
This paper presents the SemEval-2013 task on multilingual Word Sense Disambiguation. We describe our experience in producing a multilingual sense-annotated corpus for the task. The corpus is tagged with BabelNet 1.1.1, a freely-available multilingual encyclopedic dictionary and, as a byproduct, WordNet 3.0 and the Wikipedia sense inventory. We present and analyze the results of participating sy...
متن کامل